Efficient language model adaptation through MDI estimation
Abstract
This paper presents a method for n-gram language model adaptation based on the principle of minimum discrimination information. A background language model is adapted to fit constraints on its marginal distributions that are derived from new observed data. This work gives a different derivation of the model by Kneser et al. (1997) and extends its application to interpolated language models. The proposed method has been evaluated on an Italian 60K-word broadcast news task.
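When the constraints are unigram marginals estimated on the adaptation data, MDI adaptation is commonly approximated in closed form. Each background conditional probability is rescaled by a word-specific factor and renormalized per history: P_A(w|h) = P_B(w|h) · α(w) / Z(h), with α(w) = (P_A(w) / P_B(w))^β. The sketch below illustrates that rescaling under several assumptions that are not taken from this abstract. It uses a toy model stored as full conditional distributions (no back-off or interpolation, so the efficient normalization the paper develops for interpolated models is not shown). It assumes strictly positive background unigrams, and it uses a hypothetical tuning exponent `beta`. The function and argument names are illustrative only.

```python
def mdi_adapt(background, p_bg_unigram, p_adapt_unigram, beta=0.5):
    """Minimal sketch of unigram-constrained MDI adaptation (assumed closed form).

    background:      dict history -> dict word -> P_B(w|h) (full, normalized rows)
    p_bg_unigram:    dict word -> P_B(w), assumed > 0 (smoothed background unigram)
    p_adapt_unigram: dict word -> P_A(w), unigram estimated on the adaptation data
    beta:            hypothetical scaling exponent in [0, 1]; 0 leaves the model unchanged
    """
    # Word-specific scaling factors alpha(w) = (P_A(w) / P_B(w)) ** beta.
    alpha = {
        w: (p_adapt_unigram[w] / p_bg_unigram[w]) ** beta
        for w in p_bg_unigram
    }
    adapted = {}
    for h, dist in background.items():
        # Rescale every conditional probability for this history ...
        scaled = {w: p * alpha[w] for w, p in dist.items()}
        # ... and renormalize with the history-dependent term Z(h).
        z = sum(scaled.values())
        adapted[h] = {w: s / z for w, s in scaled.items()}
    return adapted


if __name__ == "__main__":
    bg = {"x": {"a": 0.5, "b": 0.5}}
    print(mdi_adapt(bg, {"a": 0.5, "b": 0.5}, {"a": 0.8, "b": 0.2}, beta=0.5))
```

With `beta=1.0` the adapted row for history `"x"` matches the new unigram (0.8 / 0.2). With `beta=0.5` it moves only part of the way (2/3 / 1/3). This is why an exponent of this kind is often used to damp the influence of small adaptation sets.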
Similar resources
MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation
This paper provides a fast alternative to Minimum Discrimination Information-based language model adaptation for statistical machine translation. We provide an alternative to computing a normalization term that requires computing full model probabilities (including back-off probabilities) for all n-grams. Rather than re-estimating an entire language model, our Lazy MDI approach leverages a smoo...
Constraint selection for topic-based MDI adaptation of language models
This paper presents an unsupervised topic-based language model adaptation method which specializes the standard minimum discrimination information approach by identifying and combining topic-specific features. By acquiring a topic terminology from a thematically coherent corpus, language model adaptation is restrained to the sole probability re-estimation of n-grams ending with some topic-speci...
Topic Adaptation for Lecture Translation through Bilingual Latent Semantic Models
This work presents a simplified approach to bilingual topic modeling for language model adaptation by combining text in the source and target language into very short documents and performing Probabilistic Latent Semantic Analysis (PLSA) during model training. During inference, documents containing only the source language can be used to infer a full topic-word distribution on all words in the ...
Dynamic language modeling for broadcast news
This paper describes some recent experiments on unsupervised language model adaptation for transcription of broadcast news data. In previous work, a framework for automatically selecting adaptation data using information retrieval techniques was proposed. This work extends the method and presents experimental results with unsupervised language model adaptation. Three primary aspects are conside...
Crosslingual tandem-SGMM: exploiting out-of-language data for acoustic model and feature level adaptation
Recent studies have shown that speech recognizers may benefit from data in languages other than the target language through efficient acoustic model- or feature-level adaptation. Crosslingual Tandem-Subspace Gaussian Mixture Models (SGMM) are successfully able to combine acoustic model- and feature-level adaptation techniques. More specifically, we focus on under-resourced languages (Afrikaans in ou...